Body Language Analysis

The aim is to analyze the body gestures of Japanese speakers when speaking in Japanese and English and to

  1. see if there is a difference in the body gestures between the two languages
  2. see if there is a difference in the body gestures depending on the speaker's fluency of English
  3. create a classifier to distinguish the spoken language or English fluency from the speaker's body gestures
  • The history of studies on body language dates back to the 1950s, when Albert Mehrabian, a pioneering researcher of body language, found that the total impact of a message is 7% verbal (words only), 38% vocal (including tone of voice, inflection and other sounds) and 55% nonverbal. The anthropologist Ray Birdwhistell also founded the field of research called "kinesics", the study of nonverbal communication. (Pease, A., & Pease, B. (2004). The Definitive Book of Body Language. Bantam Dell.)

  • In general, there are three main types of body gestures: metaphorics, iconics (both kinetographs and pictographs) and deictics. An iconic gesture depicts aspects of an object or action, and usually does not require accompanying speech. A metaphoric gesture is a special kind of iconic gesture that represents an abstract idea. Deictic gestures refer to objects or locations in physical or conceptual space, and convey aspects of a speaker’s meaning that are difficult to express in words.

  • Adams (1998) found that native speakers of English produced more deictic and iconic gestures when retelling a narrative to L2 English learners than to English-speaking interlocutors (Adams, T. W. (1998). Gesture in foreigner talk. University of Pennsylvania, Philadelphia, PA. Retrieved from http://repository.upenn.edu/dissertations/AAI9829850/)

  • Gullberg (2003) found that L2 learners are much more likely to produce iconic gestures when referring to entities that they have previously mentioned when speaking the target language than in their native language (Gullberg, M. (2003). Gestures, referents, and anaphoric linkage in learner varieties. Information structure and the dynamics of language acquisition. John Benjamins Publishing Company.)

  • Mori and Hayashi (2006) describe how one study revealed that L2 learners frequently gestured when they were unable to complete an utterance in Japanese, which in turn encouraged the Japanese interlocutors to suggest appropriate completions for their sentences (Mori, J., & Hayashi, M. (2006). The achievement of intersubjectivity through embodied completions: A study of interactions between first and second language speakers. Applied Linguistics, 27(2), 195–219. doi:10.1093/applin/aml014)

  • Morett et al. (2012) showed through their experiments that interlocutors use iconic gestures to facilitate communication in a novel second language (Morett, L.M., Gibbs, R.W., & MacWhinney, B. (2012). The role of gesture in second language learning: Communication, acquisition, and retention. Proceedings of CogSci)

  • Gullberg explains in her paper how Kida (2005) reported that, like the learning of the language itself, the gestures of an L2 learner develop over time. (The original paper by Kida was, unfortunately, written in French.) Kida examined Japanese learners of French residing in France, carefully noting the role played by the gestural properties of the source (Japanese) and target (French) cultures, of the situation and context of a particular type of interaction, and of individual preferences. He states that gestural development is "not linear but rather complex and multi-layered". (Kida, Tsuyoshi (2005). Appropriation du Geste par les Étrangers: Le Cas d’Étudiants Japonais Apprenant le Français. Aix-en-Provence: Laboratoire Parole et Langage.) (Gullberg, M. Some reasons for studying gesture and second language acquisition (Hommage à Adam Kendon))

The studies above suggest that the subjects, all of whom are L2 speakers of English, would generally gesture more when they cannot come up with the right vocabulary to express themselves, and are also likely to gesture more in English than in Japanese since English is their second language.

Hypothesis

When a Japanese person speaks in English, the frequency and size of their body gestures increase compared to when they speak in Japanese. If the speaker is fluent in English, the difference in body gestures between English and Japanese is less significant than for speakers who are unconfident in their English. In other words, fluent English speakers will gesture just as much in Japanese as they do in English.

Experiment

The data was collected by asking 6 subjects to hold a conversation for 3 minutes each in English and in Japanese.

All the subjects were Japanese students at KMD. The initial plan was to experiment with subjects of diverse races and nationalities, but body language may depend on the following three variables.

  1. the language being spoken
  2. whether you are a native / non-native speaker of the language
  3. nationality (e.g. Italians may have bigger body language in general?)

In order to reduce these to a single variable, I chose the language being spoken as the variable and fixed the other two, which meant limiting the subjects to Japanese students whose mother tongue is Japanese.

The details of the subjects are as follows.

  • 2 students who have been learning English for 10 years since junior high school, but are very unconfident in their English and have hardly ever been out of Japan (English level: LOW)
  • 2 students who have travelled abroad and are motivated to communicate in English, but are still slightly unconfident in their English (English level: INTERMEDIATE)
  • 2 students who have lived in an English-speaking country for more than 5 years and are considered bilingual (English level: HIGH)

All 6 subjects were native Japanese speakers who have no problem communicating in Japanese.

In addition to the 6 subjects, I also collected data from one more subject as the "test data" for the classifier that will be trained on the 6 subjects. This subject has also lived in an English-speaking country for more than 5 years, and is considered bilingual.

The Japanese conversation always started with the topic "Favourite countries/cities the subject has visited", and the English conversation always started with the topic "Description of the subject's mother". The subjects sat facing me with a table in between.

The accelerometer was attached to the wrist of the subject's dominant hand (y axis pointing towards the fingers, z axis pointing upwards from the back of the hand), and the gyroscope was attached to the subject's head with a headband (y axis pointing upwards from the head, x axis pointing towards the right ear). The data is raw sensor data (not calibrated). The data structure of each file is described in its header.

Below are the images of the subjects during the experiment.

Importing the Data


In [1]:
import matplotlib.pyplot as plt
from pandas import read_csv
%pylab inline

dohi_jgx = read_csv('data/jap_english/dohi/japanese/gx.txt',delimiter=',', names=['date','gx'])
dohi_jgy = read_csv('data/jap_english/dohi/japanese/gy.txt',delimiter=',', names=['date','gy'])
dohi_jgz = read_csv('data/jap_english/dohi/japanese/gz.txt',delimiter=',', names=['date','gz'])
dohi_ja = read_csv('data/jap_english/dohi/japanese/a.txt',delimiter=',',skiprows=1,names=['ax','ay','az'])

dohi_egx = read_csv('data/jap_english/dohi/english/gx.txt',delimiter=',', names=['date','gx'])
dohi_egy = read_csv('data/jap_english/dohi/english/gy.txt',delimiter=',', names=['date','gy'])
dohi_egz = read_csv('data/jap_english/dohi/english/gz.txt',delimiter=',', names=['date','gz'])
dohi_ea = read_csv('data/jap_english/dohi/english/a.txt',delimiter=',',skiprows=1,names=['ax','ay','az'])

kohei_jgx = read_csv('data/jap_english/kohei/japanese/gx.txt',delimiter=',', names=['date','gx'])
kohei_jgy = read_csv('data/jap_english/kohei/japanese/gy.txt',delimiter=',', names=['date','gy'])
kohei_jgz = read_csv('data/jap_english/kohei/japanese/gz.txt',delimiter=',', names=['date','gz'])
kohei_ja = read_csv('data/jap_english/kohei/japanese/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

kohei_egx = read_csv('data/jap_english/kohei/english/gx.txt',delimiter=',', names=['date','gx'])
kohei_egy = read_csv('data/jap_english/kohei/english/gy.txt',delimiter=',', names=['date','gy'])
kohei_egz = read_csv('data/jap_english/kohei/english/gz.txt',delimiter=',', names=['date','gz'])
kohei_ea = read_csv('data/jap_english/kohei/english/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

nure_jgx = read_csv('data/jap_english/nure/japanese/gx.txt',delimiter=',', names=['date','gx'])
nure_jgy = read_csv('data/jap_english/nure/japanese/gy.txt',delimiter=',', names=['date','gy'])
nure_jgz = read_csv('data/jap_english/nure/japanese/gz.txt',delimiter=',', names=['date','gz'])
nure_ja = read_csv('data/jap_english/nure/japanese/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

nure_egx = read_csv('data/jap_english/nure/english/gx.txt',delimiter=',', names=['date','gx'])
nure_egy = read_csv('data/jap_english/nure/english/gy.txt',delimiter=',', names=['date','gy'])
nure_egz = read_csv('data/jap_english/nure/english/gz.txt',delimiter=',', names=['date','gz'])
nure_ea = read_csv('data/jap_english/nure/english/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

yamana_jgx = read_csv('data/jap_english/yamana/japanese/gx.txt',delimiter=',', names=['date','gx'])
yamana_jgy = read_csv('data/jap_english/yamana/japanese/gy.txt',delimiter=',', names=['date','gy'])
yamana_jgz = read_csv('data/jap_english/yamana/japanese/gz.txt',delimiter=',', names=['date','gz'])
yamana_ja = read_csv('data/jap_english/yamana/japanese/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

yamana_egx = read_csv('data/jap_english/yamana/english/gx.txt',delimiter=',', names=['date','gx'])
yamana_egy = read_csv('data/jap_english/yamana/english/gy.txt',delimiter=',', names=['date','gy'])
yamana_egz = read_csv('data/jap_english/yamana/english/gz.txt',delimiter=',', names=['date','gz'])
yamana_ea = read_csv('data/jap_english/yamana/english/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

toshi_jgx = read_csv('data/jap_english/toshi/japanese/gx.txt',delimiter=',', names=['date','gx'])
toshi_jgy = read_csv('data/jap_english/toshi/japanese/gy.txt',delimiter=',', names=['date','gy'])
toshi_jgz = read_csv('data/jap_english/toshi/japanese/gz.txt',delimiter=',', names=['date','gz'])
toshi_ja = read_csv('data/jap_english/toshi/japanese/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

toshi_egx = read_csv('data/jap_english/toshi/english/gx.txt',delimiter=',', names=['date','gx'])
toshi_egy = read_csv('data/jap_english/toshi/english/gy.txt',delimiter=',', names=['date','gy'])
toshi_egz = read_csv('data/jap_english/toshi/english/gz.txt',delimiter=',', names=['date','gz'])
toshi_ea = read_csv('data/jap_english/toshi/english/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

yukita_jgx = read_csv('data/jap_english/yukita/japanese/gx.txt',delimiter=',', names=['date','gx'])
yukita_jgy = read_csv('data/jap_english/yukita/japanese/gy.txt',delimiter=',', names=['date','gy'])
yukita_jgz = read_csv('data/jap_english/yukita/japanese/gz.txt',delimiter=',', names=['date','gz'])
yukita_ja = read_csv('data/jap_english/yukita/japanese/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

yukita_egx = read_csv('data/jap_english/yukita/english/gx.txt',delimiter=',', names=['date','gx'])
yukita_egy = read_csv('data/jap_english/yukita/english/gy.txt',delimiter=',', names=['date','gy'])
yukita_egz = read_csv('data/jap_english/yukita/english/gz.txt',delimiter=',', names=['date','gz'])
yukita_ea = read_csv('data/jap_english/yukita/english/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

miyuki_jgx = read_csv('data/jap_english/miyuki/japanese/gx.txt',delimiter=',', names=['date','gx'])
miyuki_jgy = read_csv('data/jap_english/miyuki/japanese/gy.txt',delimiter=',', names=['date','gy'])
miyuki_jgz = read_csv('data/jap_english/miyuki/japanese/gz.txt',delimiter=',', names=['date','gz'])
miyuki_ja = read_csv('data/jap_english/miyuki/japanese/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])

miyuki_egx = read_csv('data/jap_english/miyuki/english/gx.txt',delimiter=',', names=['date','gx'])
miyuki_egy = read_csv('data/jap_english/miyuki/english/gy.txt',delimiter=',', names=['date','gy'])
miyuki_egz = read_csv('data/jap_english/miyuki/english/gz.txt',delimiter=',', names=['date','gz'])
miyuki_ea = read_csv('data/jap_english/miyuki/english/a_cut.txt',delimiter='\t',names=['date', 'ax','ay','az'])


Populating the interactive namespace from numpy and matplotlib
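The cell above repeats the same four reads for every subject and language. As an aside, the loading could be condensed into a small helper; this is only a sketch, assuming the directory layout shown above (the function name and the dict-based return value are my own, not part of the original notebook):

```python
import os
import pandas as pd

# Hypothetical helper: load one subject/language session from the layout
# used above. Most subjects use a tab-separated a_cut.txt with a date
# column; dohi uses a comma-separated a.txt with a one-line header.
def load_session(base, acc_file='a_cut.txt'):
    data = {}
    for axis in ('gx', 'gy', 'gz'):
        data[axis] = pd.read_csv(os.path.join(base, axis + '.txt'),
                                 delimiter=',', names=['date', axis])
    if acc_file == 'a.txt':
        data['a'] = pd.read_csv(os.path.join(base, acc_file), delimiter=',',
                                skiprows=1, names=['ax', 'ay', 'az'])
    else:
        data['a'] = pd.read_csv(os.path.join(base, acc_file), delimiter='\t',
                                names=['date', 'ax', 'ay', 'az'])
    return data

# e.g. kohei_j = load_session('data/jap_english/kohei/japanese')
```

The explicit per-subject variables above are kept as-is, since later cells refer to them by name.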

Merging the gyro data and getting the absolute value


In [2]:
import pandas as pd
dohi_jgxy = pd.merge(dohi_jgx, dohi_jgy)
dohi_jgxy.head()


Out[2]:
date gx gy
0 2015/06/03 22:47:50.024 0.053549 0.027121
1 2015/06/03 22:47:50.118 0.042817 0.006576
2 2015/06/03 22:47:50.211 0.020434 0.027858
3 2015/06/03 22:47:50.304 0.086043 -0.015291
4 2015/06/03 22:47:50.398 0.082829 0.037101

In [3]:
dohi_jg = pd.merge(dohi_jgxy, dohi_jgz)
dohi_jg['gx'] = dohi_jg['gx'].abs()
dohi_jg['gy'] = dohi_jg['gy'].abs()
dohi_jg['gz'] = dohi_jg['gz'].abs()

In [5]:
dohi_jg.head()


Out[5]:
date gx gy gz
0 2015/06/03 22:47:50.024 0.053549 0.027121 0.018867
1 2015/06/03 22:47:50.118 0.042817 0.006576 0.037102
2 2015/06/03 22:47:50.211 0.020434 0.027858 0.014675
3 2015/06/03 22:47:50.304 0.086043 0.015291 0.043593
4 2015/06/03 22:47:50.398 0.082829 0.037101 0.039177

In [4]:
nure_jgxy = pd.merge(nure_jgx, nure_jgy)
nure_jg = pd.merge(nure_jgxy, nure_jgz)
nure_jg['gx'] = nure_jg['gx'].abs()
nure_jg['gy'] = nure_jg['gy'].abs()
nure_jg['gz'] = nure_jg['gz'].abs()

kohei_jgxy = pd.merge(kohei_jgx, kohei_jgy)
kohei_jg = pd.merge(kohei_jgxy, kohei_jgz)
kohei_jg['gx'] = kohei_jg['gx'].abs()
kohei_jg['gy'] = kohei_jg['gy'].abs()
kohei_jg['gz'] = kohei_jg['gz'].abs()

yamana_jgxy = pd.merge(yamana_jgx, yamana_jgy)
yamana_jg = pd.merge(yamana_jgxy, yamana_jgz)
yamana_jg['gx'] = yamana_jg['gx'].abs()
yamana_jg['gy'] = yamana_jg['gy'].abs()
yamana_jg['gz'] = yamana_jg['gz'].abs()

toshi_jgxy = pd.merge(toshi_jgx, toshi_jgy)
toshi_jg = pd.merge(toshi_jgxy, toshi_jgz)
toshi_jg['gx'] = toshi_jg['gx'].abs()
toshi_jg['gy'] = toshi_jg['gy'].abs()
toshi_jg['gz'] = toshi_jg['gz'].abs()

yukita_jgxy = pd.merge(yukita_jgx, yukita_jgy)
yukita_jg = pd.merge(yukita_jgxy, yukita_jgz)
yukita_jg['gx'] = yukita_jg['gx'].abs()
yukita_jg['gy'] = yukita_jg['gy'].abs()
yukita_jg['gz'] = yukita_jg['gz'].abs()

miyuki_jgxy = pd.merge(miyuki_jgx, miyuki_jgy)
miyuki_jg = pd.merge(miyuki_jgxy, miyuki_jgz)
miyuki_jg['gx'] = miyuki_jg['gx'].abs()
miyuki_jg['gy'] = miyuki_jg['gy'].abs()
miyuki_jg['gz'] = miyuki_jg['gz'].abs()

dohi_egxy = pd.merge(dohi_egx, dohi_egy)
dohi_eg = pd.merge(dohi_egxy, dohi_egz)
dohi_eg['gx'] = dohi_eg['gx'].abs()
dohi_eg['gy'] = dohi_eg['gy'].abs()
dohi_eg['gz'] = dohi_eg['gz'].abs()

nure_egxy = pd.merge(nure_egx, nure_egy)
nure_eg = pd.merge(nure_egxy, nure_egz)
nure_eg['gx'] = nure_eg['gx'].abs()
nure_eg['gy'] = nure_eg['gy'].abs()
nure_eg['gz'] = nure_eg['gz'].abs()


kohei_egxy = pd.merge(kohei_egx, kohei_egy)
kohei_eg = pd.merge(kohei_egxy, kohei_egz)
kohei_eg['gx'] = kohei_eg['gx'].abs()
kohei_eg['gy'] = kohei_eg['gy'].abs()
kohei_eg['gz'] = kohei_eg['gz'].abs()

yamana_egxy = pd.merge(yamana_egx, yamana_egy)
yamana_eg = pd.merge(yamana_egxy, yamana_egz)
yamana_eg['gx'] = yamana_eg['gx'].abs()
yamana_eg['gy'] = yamana_eg['gy'].abs()
yamana_eg['gz'] = yamana_eg['gz'].abs()

toshi_egxy = pd.merge(toshi_egx, toshi_egy)
toshi_eg = pd.merge(toshi_egxy, toshi_egz)
toshi_eg['gx'] = toshi_eg['gx'].abs()
toshi_eg['gy'] = toshi_eg['gy'].abs()
toshi_eg['gz'] = toshi_eg['gz'].abs()

yukita_egxy = pd.merge(yukita_egx, yukita_egy)
yukita_eg = pd.merge(yukita_egxy, yukita_egz)
yukita_eg['gx'] = yukita_eg['gx'].abs()
yukita_eg['gy'] = yukita_eg['gy'].abs()
yukita_eg['gz'] = yukita_eg['gz'].abs()

miyuki_egxy = pd.merge(miyuki_egx, miyuki_egy)
miyuki_eg = pd.merge(miyuki_egxy, miyuki_egz)
miyuki_eg['gx'] = miyuki_eg['gx'].abs()
miyuki_eg['gy'] = miyuki_eg['gy'].abs()
miyuki_eg['gz'] = miyuki_eg['gz'].abs()
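The merge-and-absolute-value pattern above is identical for every subject; it could be wrapped in a small helper. This is a sketch only (the function name is my own), expressing exactly the three steps repeated per subject:

```python
import pandas as pd

# Hypothetical helper: inner-merge the three gyro axes on their shared
# 'date' column, then keep only the magnitude of each reading.
def merge_gyro_abs(gx, gy, gz):
    g = pd.merge(pd.merge(gx, gy), gz)
    for axis in ('gx', 'gy', 'gz'):
        g[axis] = g[axis].abs()
    return g

# e.g. dohi_jg = merge_gyro_abs(dohi_jgx, dohi_jgy, dohi_jgz)
```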

Normalizing accelerometer data


In [5]:
dohi_ja['ax'] = (dohi_ja['ax'] - 498).abs()
dohi_ja['ay'] = (dohi_ja['ay'] - 536).abs()
dohi_ja['az'] = (dohi_ja['az'] - 715).abs()

dohi_ea['ax'] = (dohi_ea['ax'] - 498).abs()
dohi_ea['ay'] = (dohi_ea['ay'] - 536).abs()
dohi_ea['az'] = (dohi_ea['az'] - 715).abs()

nure_ja['ax'] = (nure_ja['ax'] - 498).abs()
nure_ja['ay'] = (nure_ja['ay'] - 536).abs()
nure_ja['az'] = (nure_ja['az'] - 715).abs()

nure_ea['ax'] = (nure_ea['ax'] - 498).abs()
nure_ea['ay'] = (nure_ea['ay'] - 536).abs()
nure_ea['az'] = (nure_ea['az'] - 715).abs()

kohei_ja['ax'] = (kohei_ja['ax'] - 498).abs()
kohei_ja['ay'] = (kohei_ja['ay'] - 536).abs()
kohei_ja['az'] = (kohei_ja['az'] - 715).abs()

kohei_ea['ax'] = (kohei_ea['ax'] - 498).abs()
kohei_ea['ay'] = (kohei_ea['ay'] - 536).abs()
kohei_ea['az'] = (kohei_ea['az'] - 715).abs()

yamana_ja['ax'] = (yamana_ja['ax'] - 498).abs()
yamana_ja['ay'] = (yamana_ja['ay'] - 536).abs()
yamana_ja['az'] = (yamana_ja['az'] - 715).abs()

yamana_ea['ax'] = (yamana_ea['ax'] - 498).abs()
yamana_ea['ay'] = (yamana_ea['ay'] - 536).abs()
yamana_ea['az'] = (yamana_ea['az'] - 715).abs()

toshi_ja['ax'] = (toshi_ja['ax'] - 498).abs()
toshi_ja['ay'] = (toshi_ja['ay'] - 536).abs()
toshi_ja['az'] = (toshi_ja['az'] - 715).abs()

toshi_ea['ax'] = (toshi_ea['ax'] - 498).abs()
toshi_ea['ay'] = (toshi_ea['ay'] - 536).abs()
toshi_ea['az'] = (toshi_ea['az'] - 715).abs()

yukita_ja['ax'] = (yukita_ja['ax'] - 498).abs()
yukita_ja['ay'] = (yukita_ja['ay'] - 536).abs()
yukita_ja['az'] = (yukita_ja['az'] - 715).abs()

yukita_ea['ax'] = (yukita_ea['ax'] - 498).abs()
yukita_ea['ay'] = (yukita_ea['ay'] - 536).abs()
yukita_ea['az'] = (yukita_ea['az'] - 715).abs()

miyuki_ja['ax'] = (miyuki_ja['ax'] - 498).abs()
miyuki_ja['ay'] = (miyuki_ja['ay'] - 536).abs()
miyuki_ja['az'] = (miyuki_ja['az'] - 715).abs()

miyuki_ea['ax'] = (miyuki_ea['ax'] - 498).abs()
miyuki_ea['ay'] = (miyuki_ea['ay'] - 536).abs()
miyuki_ea['az'] = (miyuki_ea['az'] - 715).abs()
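The normalization above could likewise be expressed once. A minimal sketch, assuming the offsets 498, 536 and 715 are the sensor's resting baselines on each axis (the helper name and the `REST` table are my own):

```python
import pandas as pd

# Assumed resting baselines of the raw accelerometer on each axis.
REST = {'ax': 498, 'ay': 536, 'az': 715}

# Hypothetical helper: subtract each axis's baseline and keep the
# magnitude of the deviation, as done per subject above.
def normalize_acc(df):
    for axis, rest in REST.items():
        df[axis] = (df[axis] - rest).abs()
    return df

# e.g. dohi_ja = normalize_acc(dohi_ja)
```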

Data Exploration


In [6]:
dohi_ja.describe()


Out[6]:
ax ay az
count 8344.000000 8344.000000 8344.000000
mean 13.938998 11.802493 2.330537
std 8.394803 6.008330 4.177105
min 0.000000 0.000000 0.000000
25% 7.000000 9.000000 1.000000
50% 13.000000 11.000000 1.000000
75% 20.000000 13.000000 3.000000
max 112.000000 77.000000 85.000000

In [7]:
dohi_ea.describe()


Out[7]:
ax ay az
count 8565.000000 8565.000000 8565.000000
mean 46.275073 46.834442 26.834209
std 43.955436 63.745891 59.135428
min 0.000000 0.000000 0.000000
25% 14.000000 18.000000 2.000000
50% 44.000000 24.000000 4.000000
75% 57.000000 37.000000 7.000000
max 512.000000 407.000000 340.000000

In [8]:
dohi_jg.describe()


Out[8]:
gx gy gz
count 2099.000000 2099.000000 2099.000000
mean 0.154031 0.081898 0.092187
std 0.172759 0.105041 0.104546
min 0.000036 0.000021 0.000267
25% 0.059843 0.024141 0.037604
50% 0.101817 0.050890 0.064830
75% 0.189783 0.101126 0.108477
max 2.238537 1.271654 1.660091

In [9]:
dohi_eg.describe()


Out[9]:
gx gy gz
count 2128.000000 2128.000000 2128.000000
mean 0.188360 0.124769 0.151924
std 0.212338 0.172693 0.180918
min 0.000482 0.000047 0.000107
25% 0.065439 0.028480 0.044793
50% 0.119615 0.061904 0.089711
75% 0.231355 0.154729 0.183847
max 2.441734 2.114181 1.513094

In [10]:
nure_ja.describe()


Out[10]:
ax ay az
count 7816.000000 7816.000000 7816.000000
mean 155.710466 48.758572 77.591351
std 51.751488 55.780983 48.769074
min 1.000000 0.000000 0.000000
25% 137.000000 11.000000 49.000000
50% 163.000000 23.000000 68.000000
75% 185.000000 66.000000 92.000000
max 485.000000 450.000000 548.000000

In [11]:
nure_ea.describe()


Out[11]:
ax ay az
count 8523.000000 8523.000000 8523.000000
mean 129.824944 115.888889 103.460049
std 74.338097 69.942985 79.385116
min 0.000000 0.000000 0.000000
25% 71.000000 60.000000 43.000000
50% 130.000000 107.000000 77.000000
75% 180.000000 163.000000 156.000000
max 487.000000 410.000000 470.000000

In [12]:
nure_jg.describe()


Out[12]:
gx gy gz
count 1944.000000 1944.000000 1944.000000
mean 0.173380 0.142334 0.135241
std 0.196616 0.201231 0.172629
min 0.000016 0.000105 0.000280
25% 0.060659 0.030855 0.039338
50% 0.107299 0.073584 0.080548
75% 0.222222 0.169897 0.172262
max 2.219798 1.824189 2.940897

In [13]:
nure_eg.describe()


Out[13]:
gx gy gz
count 2058.000000 2058.000000 2058.000000
mean 0.171023 0.163446 0.164247
std 0.187886 0.218380 0.228258
min 0.000039 0.000039 0.000087
25% 0.053717 0.033014 0.045948
50% 0.110274 0.089567 0.099012
75% 0.216160 0.210070 0.199706
max 1.979083 2.193491 4.554734

In [14]:
kohei_ja.describe()


Out[14]:
ax ay az
count 8245.000000 8245.000000 8245.000000
mean 109.162159 131.708308 62.179018
std 38.336416 50.558417 37.749278
min 7.000000 21.000000 0.000000
25% 69.000000 81.000000 20.000000
50% 126.000000 155.000000 85.000000
75% 138.000000 177.000000 95.000000
max 223.000000 238.000000 129.000000

In [15]:
kohei_ea.describe()


Out[15]:
ax ay az
count 8122.000000 8122.000000 8122.000000
mean 66.053066 64.964418 11.130017
std 7.001611 4.346986 2.827920
min 33.000000 49.000000 0.000000
25% 63.000000 62.000000 10.000000
50% 66.000000 65.000000 11.000000
75% 70.000000 68.000000 12.000000
max 129.000000 95.000000 53.000000

In [16]:
kohei_jg.describe()


Out[16]:
gx gy gz
count 2030.000000 2030.000000 2030.000000
mean 0.143999 0.062906 0.071715
std 0.149712 0.078820 0.070707
min 0.000007 0.000034 0.000056
25% 0.053864 0.017991 0.030575
50% 0.094984 0.037919 0.053305
75% 0.179413 0.074748 0.091097
max 1.628062 0.727467 0.928339

In [17]:
kohei_eg.describe()


Out[17]:
gx gy gz
count 1954.000000 1954.000000 1954.000000
mean 0.209931 0.085349 0.099926
std 0.240317 0.096477 0.101399
min 0.000121 0.000039 0.000060
25% 0.057409 0.023273 0.034495
50% 0.127588 0.049167 0.067521
75% 0.266977 0.114234 0.130506
max 2.019367 0.846836 0.834234

In [18]:
yamana_ja.describe()


Out[18]:
ax ay az
count 8445.000000 8445.000000 8445.000000
mean 128.138662 162.676495 221.101125
std 61.898540 66.109682 107.613468
min 0.000000 0.000000 0.000000
25% 84.000000 132.000000 136.000000
50% 139.000000 175.000000 201.000000
75% 163.000000 210.000000 322.000000
max 521.000000 454.000000 667.000000

In [19]:
yamana_ea.describe()


Out[19]:
ax ay az
count 7961.000000 7961.000000 7961.000000
mean 127.505464 121.296948 254.832936
std 69.350102 56.089896 129.347507
min 0.000000 0.000000 1.000000
25% 75.000000 88.000000 150.000000
50% 110.000000 124.000000 245.000000
75% 185.000000 157.000000 370.000000
max 522.000000 410.000000 710.000000

In [20]:
yamana_jg.describe()


Out[20]:
gx gy gz
count 2042.000000 2042.000000 2042.000000
mean 0.179675 0.272659 0.206725
std 0.199204 0.502072 0.336218
min 0.000009 0.000009 0.000181
25% 0.063892 0.061167 0.053589
50% 0.119215 0.144742 0.123021
75% 0.220286 0.300762 0.241688
max 2.150331 6.171508 3.775052

In [21]:
yamana_eg.describe()


Out[21]:
gx gy gz
count 1915.000000 1915.000000 1915.000000
mean 0.133759 0.175646 0.167123
std 0.121238 0.201914 0.176686
min 0.000376 0.000123 0.000272
25% 0.051869 0.039201 0.051947
50% 0.099535 0.111445 0.105453
75% 0.175881 0.231527 0.219055
max 1.069654 1.385312 1.161772

In [22]:
toshi_ja.describe()


Out[22]:
ax ay az
count 2600.000000 2600.000000 2600.000000
mean 220.798077 64.755385 168.674615
std 30.398349 32.498216 53.289140
min 4.000000 0.000000 2.000000
25% 210.000000 40.000000 125.000000
50% 230.000000 62.000000 177.500000
75% 238.000000 85.000000 201.000000
max 327.000000 262.000000 424.000000

In [23]:
toshi_ea.describe()


Out[23]:
ax ay az
count 2526.000000 2526.000000 2526.000000
mean 218.230008 21.197941 249.893903
std 31.160075 24.534308 62.577842
min 34.000000 0.000000 3.000000
25% 210.000000 5.000000 218.000000
50% 222.000000 15.000000 265.000000
75% 237.000000 30.000000 290.000000
max 470.000000 217.000000 425.000000

In [24]:
toshi_jg.describe()


Out[24]:
gx gy gz
count 2023.000000 2023.000000 2023.000000
mean 0.147586 0.166275 0.152512
std 0.152061 0.280774 0.212885
min 0.000011 0.000007 0.000123
25% 0.051827 0.039950 0.041538
50% 0.094581 0.074615 0.080912
75% 0.188155 0.179466 0.174754
max 1.441948 3.438870 1.907151

In [25]:
toshi_eg.describe()


Out[25]:
gx gy gz
count 2033.000000 2033.000000 2033.000000
mean 0.139347 0.134345 0.129093
std 0.153419 0.152256 0.166511
min 0.000058 0.000058 0.000096
25% 0.049592 0.044059 0.038021
50% 0.093072 0.078231 0.076194
75% 0.173285 0.167170 0.153738
max 1.485249 1.506839 1.594890

In [26]:
yukita_ja.describe()


Out[26]:
ax ay az
count 2527.000000 2527.000000 2527.000000
mean 197.931539 136.108825 193.755046
std 73.564917 44.527550 63.253181
min 0.000000 0.000000 0.000000
25% 184.000000 116.000000 161.000000
50% 206.000000 132.000000 201.000000
75% 221.000000 155.000000 226.000000
max 522.000000 424.000000 601.000000

In [27]:
yukita_ea.describe()


Out[27]:
ax ay az
count 2662.000000 2662.000000 2662.000000
mean 204.919985 101.090158 193.232532
std 70.292367 46.471152 76.735462
min 0.000000 0.000000 1.000000
25% 191.000000 66.000000 153.250000
50% 215.000000 105.500000 193.000000
75% 229.000000 128.000000 228.000000
max 498.000000 460.000000 536.000000

In [28]:
yukita_jg.describe()


Out[28]:
gx gy gz
count 2016.000000 2016.000000 2016.000000
mean 0.299437 0.189737 0.210491
std 0.270873 0.199999 0.207286
min 0.000006 0.000042 0.000202
25% 0.101473 0.063817 0.069097
50% 0.219025 0.140953 0.148114
75% 0.420008 0.238225 0.287407
max 1.994394 1.732510 2.162964

In [29]:
yukita_eg.describe()


Out[29]:
gx gy gz
count 2125.000000 2125.000000 2125.000000
mean 0.300724 0.241422 0.204964
std 0.329862 0.317781 0.252956
min 0.000019 0.000020 0.000148
25% 0.085502 0.063239 0.053328
50% 0.189736 0.139580 0.127103
75% 0.405310 0.278125 0.253825
max 2.701567 2.804856 3.112612
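The describe() tables above could also be condensed into a single comparison table of per-axis standard deviations. This is a sketch only (the function name and the dict-of-frames layout are my own, not part of the original notebook):

```python
import pandas as pd

# Hypothetical sketch: gather each (subject, language) session's per-axis
# standard deviation into one table for side-by-side comparison.
def std_summary(frames):
    """frames: dict mapping (subject, language) -> accelerometer DataFrame."""
    rows = {key: df[['ax', 'ay', 'az']].std() for key, df in frames.items()}
    return pd.DataFrame(rows).T

# e.g. std_summary({('dohi', 'ja'): dohi_ja, ('dohi', 'en'): dohi_ea})
```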

Visualizing the data

In visualizing the data, I chose three plotting methods.

  1. a simple line plot to see how the gestures change over time
  2. a histogram to see the frequency of occurrence of specific gesture sizes
  3. a bar chart to compare the standard deviation of the sensor values in Japanese and English

Since the standard deviation of each subject's sensor data reflects the richness of their gestures in each language, I believe that comparing the standard deviations will help confirm my hypotheses.
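The standard-deviation comparison of step 3 could be drawn as a grouped bar chart rather than overlaid plots; this is a sketch only (the function name is my own):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sketch: side-by-side bars of the per-axis standard
# deviations in the two languages for one subject's sensor.
def plot_std_comparison(df_j, df_e, title):
    stds = pd.DataFrame({'Japanese': df_j.std(), 'English': df_e.std()})
    ax = stds.plot(kind='bar', title=title)
    ax.set_ylabel('Standard deviation')
    return ax

# e.g. plot_std_comparison(dohi_ja, dohi_ea, 'Subject 1: wrist accelerometer')
```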

SUBJECT 1 (English Level: LOW)


In [30]:
dja = dohi_ja.plot(title="Wrist Accelerometer in Japanese")
dja.set_ylabel("Accelerometer value")
dja.set_xlabel("Data Index")
dea = dohi_ea.plot(title="Wrist Accelerometer in English")
dea.set_ylabel("Accelerometer value")
dea.set_xlabel("Data Index")


Out[30]:
<matplotlib.text.Text at 0xd88cd30>

In [31]:
dohi_ja.hist(bins=30, figsize=(10,10), sharey = True)
dohi_ea.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[31]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000000E6BA2E8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000000E7B7B00>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000000E827D30>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000000E9315F8>]], dtype=object)

In [32]:
djg = dohi_jg.plot(title="Head Gyroscope in Japanese")
djg.set_ylabel("Gyroscope value")
djg.set_xlabel("Data Index")
deg = dohi_eg.plot(title="Head Gyroscope in English")
deg.set_ylabel("Gyroscope value")
deg.set_xlabel("Data Index")


Out[32]:
<matplotlib.text.Text at 0xe68c860>

In [33]:
dohi_jg.hist(bins=30, figsize=(10,10), sharey = True)
dohi_eg.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[33]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000000F872320>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000000E3ECF60>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000000FDB4EB8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001076C908>]], dtype=object)

In [34]:
hold(True)
#dohi_ja.std().plot(kind="bar")
#dohi_ea.std().plot(kind="bar")
plt.plot(dohi_ja.std(),label='Japanese')
plt.plot(dohi_ea.std(),label='English')
plt.legend(loc='upper right')


Out[34]:
<matplotlib.legend.Legend at 0x10f06390>

In [35]:
hold(True)
plt.plot(dohi_jg.std(),label='Japanese')
plt.plot(dohi_eg.std(),label='English')
plt.legend(loc='upper right')


Out[35]:
<matplotlib.legend.Legend at 0x1077bf60>

SUBJECT 2 (English Level: LOW)


In [36]:
kja = kohei_ja.plot(title="Wrist Accelerometer in Japanese")
kja.set_ylabel("Accelerometer value")
kja.set_xlabel("Data Index")
kea = kohei_ea.plot(title="Wrist Accelerometer in English")
kea.set_ylabel("Accelerometer value")
kea.set_xlabel("Data Index")


Out[36]:
<matplotlib.text.Text at 0x10cd4f60>

In [37]:
kohei_ja.hist(bins=30, figsize=(10,10), sharey = True)
kohei_ea.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[37]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000000011C4CF60>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000011D13B38>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000011DF2EB8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000011F079E8>]], dtype=object)

In [38]:
kjg = kohei_jg.plot(title="Head Gyroscope in Japanese")
kjg.set_ylabel("Gyroscope value")
kjg.set_xlabel("Data Index")
keg = kohei_eg.plot(title="Head Gyroscope in English")
keg.set_ylabel("Gyroscope value")
keg.set_xlabel("Data Index")


Out[38]:
<matplotlib.text.Text at 0x12b494e0>

In [39]:
kohei_jg.hist(bins=30, figsize=(10,10), sharey = True)
kohei_eg.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[39]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000000137BDC50>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000138C68D0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00000000139D0828>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000013A57278>]], dtype=object)

In [40]:
hold(True)
plt.plot(kohei_ja.std(),label='Japanese')
plt.plot(kohei_ea.std(),label='English')
plt.legend(loc='upper right')


Out[40]:
<matplotlib.legend.Legend at 0x127777f0>

In [41]:
hold(True)
plt.plot(kohei_jg.std(),label='Japanese')
plt.plot(kohei_eg.std(),label='English')
plt.legend(loc='upper right')


Out[41]:
<matplotlib.legend.Legend at 0x11d23cf8>

SUBJECT 3 (English Level: INTERMEDIATE)


In [42]:
nja = nure_ja.plot(title="Wrist Accelerometer in Japanese")
nja.set_ylabel("Accelerometer value")
nja.set_xlabel("Data Index")
nea = nure_ea.plot(title="Wrist Accelerometer in English")
nea.set_ylabel("Accelerometer value")
nea.set_xlabel("Data Index")


Out[42]:
<matplotlib.text.Text at 0xe302518>

In [43]:
nure_ja.hist(bins=30, figsize=(10,10), sharey = True)
nure_ea.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[43]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000001528A438>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000015388FD0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000001547F390>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000015539E80>]], dtype=object)

In [44]:
njg = nure_jg.plot(title="Head Gyroscope in Japanese")
njg.set_ylabel("Gyroscope value")
njg.set_xlabel("Data Index")
neg = nure_eg.plot(title="Head Gyroscope in English")
neg.set_ylabel("Gyroscope value")
neg.set_xlabel("Data Index")


Out[44]:
<matplotlib.text.Text at 0x12ccc208>

In [45]:
nure_jg.hist(bins=30, figsize=(10,10), sharey = True)
nure_eg.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[45]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000001643B1D0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000016534E10>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00000000165E6D68>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000166B17B8>]], dtype=object)

In [46]:
plt.figure()
plt.plot(nure_ja.std(),label='Japanese')
plt.plot(nure_ea.std(),label='English')
plt.legend(loc='upper right')


Out[46]:
<matplotlib.legend.Legend at 0x150e67b8>

In [47]:
plt.figure()
plt.plot(nure_jg.std(),label='Japanese')
plt.plot(nure_eg.std(),label='English')
plt.legend(loc='upper right')


Out[47]:
<matplotlib.legend.Legend at 0x145ba400>

SUBJECT 4 (English Level: INTERMEDIATE)


In [48]:
tja = toshi_ja.plot(title="Wrist Accelerometer in Japanese")
tja.set_ylabel("Accelerometer value")
tja.set_xlabel("Data Index")
tea = toshi_ea.plot(title="Wrist Accelerometer in English")
tea.set_ylabel("Accelerometer value")
tea.set_xlabel("Data Index")


Out[48]:
<matplotlib.text.Text at 0x16c359b0>

In [49]:
toshi_ja.hist(bins=30, figsize=(10,10), sharey = True)
toshi_ea.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[49]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000000015EF14A8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001615BF98>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x00000000161CD160>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000143CECC0>]], dtype=object)

In [50]:
tjg = toshi_jg.plot(title="Head Gyroscope in Japanese")
tjg.set_ylabel("Gyroscope value")
tjg.set_xlabel("Data Index")
teg = toshi_eg.plot(title="Head Gyroscope in English")
teg.set_ylabel("Gyroscope value")
teg.set_xlabel("Data Index")


Out[50]:
<matplotlib.text.Text at 0x17e8e390>

In [51]:
toshi_jg.hist(bins=30, figsize=(10,10), sharey = True)
toshi_eg.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[51]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000001467AAC8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000015175CF8>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000000017951C88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x00000000177A2160>]], dtype=object)

In [52]:
plt.figure()
plt.plot(toshi_ja.std(),label='Japanese')
plt.plot(toshi_ea.std(),label='English')
plt.legend(loc='upper right')


Out[52]:
<matplotlib.legend.Legend at 0x18eb5e80>

In [53]:
plt.figure()
plt.plot(toshi_jg.std(),label='Japanese')
plt.plot(toshi_eg.std(),label='English')
plt.legend(loc='upper right')


Out[53]:
<matplotlib.legend.Legend at 0x190954a8>

SUBJECT 5 (English Level: HIGH)


In [54]:
yuja = yukita_ja.plot(title="Wrist Accelerometer in Japanese")
yuja.set_ylabel("Accelerometer value")
yuja.set_xlabel("Data Index")
yuea = yukita_ea.plot(title="Wrist Accelerometer in English")
yuea.set_ylabel("Accelerometer value")
yuea.set_xlabel("Data Index")


Out[54]:
<matplotlib.text.Text at 0x19553588>

In [55]:
yukita_ja.hist(bins=30, figsize=(10,10), sharey = True)
yukita_ea.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[55]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000000017809588>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000018875CC0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000001467A5F8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000000018BCF438>]], dtype=object)

In [56]:
yujg = yukita_jg.plot(title="Head Gyroscope in Japanese")
yujg.set_ylabel("Gyroscope value")
yujg.set_xlabel("Data Index")
yueg = yukita_eg.plot(title="Head Gyroscope in English")
yueg.set_ylabel("Gyroscope value")
yueg.set_xlabel("Data Index")


Out[56]:
<matplotlib.text.Text at 0x1a92ac18>

In [57]:
yukita_jg.hist(bins=30, figsize=(10,10), sharey = True)
yukita_eg.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[57]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000001B903DD8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001BA0EA58>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000001BB149B0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001BB9B400>]], dtype=object)

In [58]:
plt.figure()
plt.plot(yukita_ja.std(),label='Japanese')
plt.plot(yukita_ea.std(),label='English')
plt.legend(loc='upper right')


Out[58]:
<matplotlib.legend.Legend at 0x1bd22f98>

In [59]:
plt.figure()
plt.plot(yukita_jg.std(),label='Japanese')
plt.plot(yukita_eg.std(),label='English')
plt.legend(loc='upper right')


Out[59]:
<matplotlib.legend.Legend at 0x1bf1ab00>

SUBJECT 6 (English Level: HIGH)


In [60]:
yja = yamana_ja.plot(title="Wrist Accelerometer in Japanese")
yja.set_ylabel("Accelerometer value")
yja.set_xlabel("Data Index")
yea = yamana_ea.plot(title="Wrist Accelerometer in English")
yea.set_ylabel("Accelerometer value")
yea.set_xlabel("Data Index")


Out[60]:
<matplotlib.text.Text at 0x1c168438>

In [61]:
yamana_ja.hist(bins=30, figsize=(10,10), sharey = True)
yamana_ea.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[61]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000000001F901C88>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001F9CA860>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000001FA66BE0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001FB39710>]], dtype=object)

In [62]:
yjg = yamana_jg.plot(title="Head Gyroscope in Japanese")
yjg.set_ylabel("Gyroscope value")
yjg.set_xlabel("Data Index")
yeg = yamana_eg.plot(title="Head Gyroscope in English")
yeg.set_ylabel("Gyroscope value")
yeg.set_xlabel("Data Index")


Out[62]:
<matplotlib.text.Text at 0x2071f390>

In [63]:
yamana_jg.hist(bins=30, figsize=(10,10), sharey = True)
yamana_eg.hist(bins=30, facecolor='green', figsize=(10,10), sharey = True)


Out[63]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x00000000200B7EB8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001854D7F0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000000001C13ABE0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000000001B6CFEF0>]], dtype=object)

In [64]:
plt.figure()
plt.plot(yamana_ja.std(),label='Japanese')
plt.plot(yamana_ea.std(),label='English')
plt.legend(loc='upper right')


Out[64]:
<matplotlib.legend.Legend at 0x19c0d358>

In [65]:
plt.figure()
plt.plot(yamana_jg.std(),label='Japanese')
plt.plot(yamana_eg.std(),label='English')
plt.legend(loc='upper right')


Out[65]:
<matplotlib.legend.Legend at 0x1f4ae3c8>
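
The per-axis standard deviations compared subject by subject above can also be gathered into a single table for a side-by-side view. A minimal sketch with synthetic stand-in recordings (the real `*_ja`/`*_ea` frames are not reproduced here, so the numbers below are illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
# Stand-ins for one subject's wrist-accelerometer recordings in each language.
ja = pd.DataFrame(rng.normal(size=(100, 3)), columns=['ax', 'ay', 'az'])
ea = pd.DataFrame(rng.normal(scale=1.5, size=(100, 3)), columns=['ax', 'ay', 'az'])

# One row per language; columns are the per-axis standard deviations.
stds = pd.DataFrame({'Japanese': ja.std(), 'English': ea.std()}).T
print(stds.round(2))
```

With the real frames, the same pattern (e.g. `pd.DataFrame({'Japanese': yamana_ja.std(), 'English': yamana_ea.std()}).T`) would give one such table per subject, making the language-to-language comparison easier to read than overlaid line plots.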

Creating a vector for each subject containing the values of all the histograms

In order to create a classifier, I created a vector for each subject. The vector contains the counts of 30 bins for the subject's accelerometer and gyroscope histograms. Since each subject has 12 histograms (ax, ay, az in Japanese and English, and gx, gy, gz in Japanese and English), each subject's vector has 12 x 30 = 360 dimensions. The bin range of each histogram was fixed by the maximum and minimum values of ax, ay, az, gx, gy, and gz across every participant.


In [66]:
import numpy as np

# Histogram bin ranges, fixed from the global min/max of each channel across
# all participants (accelerometer: ax, ay, az; gyroscope: gx, gy, gz).
ACC_RANGES = [('ax', (0, 522)), ('ay', (0, 460)), ('az', (0, 710))]
GYR_RANGES = [('gx', (0, 2.71)), ('gy', (0, 6.18)), ('gz', (0, 4.56))]

def histogram_vector(ja, ea, jg, eg, bins=30):
    """Concatenate the 30-bin histograms of all 12 channels
    (Japanese/English x accelerometer/gyroscope x 3 axes)
    into a single 360-dimensional feature vector."""
    counts = []
    for frame, ranges in ((ja, ACC_RANGES), (ea, ACC_RANGES),
                          (jg, GYR_RANGES), (eg, GYR_RANGES)):
        for col, rng in ranges:
            counts += np.histogram(frame[col], bins=bins, range=rng)[0].tolist()
    return counts

dohi_combined = histogram_vector(dohi_ja, dohi_ea, dohi_jg, dohi_eg)
kohei_combined = histogram_vector(kohei_ja, kohei_ea, kohei_jg, kohei_eg)
nure_combined = histogram_vector(nure_ja, nure_ea, nure_jg, nure_eg)
toshi_combined = histogram_vector(toshi_ja, toshi_ea, toshi_jg, toshi_eg)
yukita_combined = histogram_vector(yukita_ja, yukita_ea, yukita_jg, yukita_eg)
yamana_combined = histogram_vector(yamana_ja, yamana_ea, yamana_jg, yamana_eg)
miyuki_combined = histogram_vector(miyuki_ja, miyuki_ea, miyuki_jg, miyuki_eg)

In [67]:
dohi_combined


Out[67]:
[5670,
 2618,
 37,
 7,
 10,
 0,
 2,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 7309,
 895,
 84,
 34,
 20,
 2,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 8281,
 53,
 8,
 2,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 2299,
 782,
 2918,
 1474,
 264,
 171,
 176,
 183,
 66,
 50,
 28,
 44,
 31,
 22,
 17,
 2,
 9,
 5,
 6,
 2,
 0,
 0,
 4,
 2,
 0,
 0,
 4,
 4,
 0,
 2,
 1735,
 3666,
 1776,
 234,
 70,
 69,
 49,
 41,
 37,
 32,
 30,
 13,
 35,
 47,
 243,
 315,
 155,
 12,
 3,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 3,
 0,
 0,
 0,
 7138,
 236,
 106,
 90,
 119,
 108,
 172,
 193,
 132,
 138,
 89,
 21,
 16,
 0,
 7,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 914,
 620,
 263,
 128,
 70,
 38,
 25,
 10,
 2,
 9,
 5,
 6,
 0,
 1,
 1,
 2,
 1,
 1,
 2,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1935,
 131,
 19,
 9,
 3,
 0,
 2,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1801,
 218,
 43,
 24,
 8,
 3,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 784,
 624,
 290,
 159,
 84,
 50,
 46,
 19,
 28,
 17,
 5,
 4,
 4,
 2,
 4,
 1,
 1,
 2,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1758,
 243,
 79,
 31,
 4,
 5,
 4,
 2,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1485,
 363,
 143,
 58,
 42,
 20,
 9,
 3,
 3,
 2,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0]

In [73]:
train_data = pd.DataFrame({'dohi': dohi_combined,'kohei':kohei_combined, 'nure':nure_combined, 'toshi':toshi_combined, 'yukita':yukita_combined, 'yamana':yamana_combined})
test_data = pd.DataFrame({'miyuki': miyuki_combined})

Creating a Classifier with Support Vector Classification


In [74]:
train_data.T.head()


Out[74]:
0 1 2 3 4 5 6 7 8 9 ... 350 351 352 353 354 355 356 357 358 359
dohi 5670 2618 37 7 10 0 2 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
kohei 7 81 1021 964 589 102 927 2600 1739 180 ... 0 0 0 0 0 0 0 0 0 0
nure 107 117 214 175 242 278 275 689 1340 1390 ... 0 0 0 0 0 0 0 0 0 1
toshi 4 6 5 7 4 15 6 8 20 99 ... 0 0 0 0 0 0 0 0 0 0
yamana 311 315 376 561 623 626 678 788 1606 1082 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 360 columns


In [75]:
label = pd.DataFrame({'label': [1,1,2,2,3,3]})  # 1 = LOW, 2 = INTERMEDIATE, 3 = HIGH, one per subject row of train_data.T

In [76]:
label.head()


Out[76]:
label
0 1
1 1
2 2
3 2
4 3

In [77]:
state = np.random.RandomState(1)
import sklearn.svm as svm
svc = svm.LinearSVC(random_state=state)

svc.fit(train_data.T, label)


C:\Users\footy_000\Anaconda\lib\site-packages\sklearn\preprocessing\label.py:125: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
Out[77]:
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=<mtrand.RandomState object at 0x0000000021A838D0>,
     tol=0.0001, verbose=0)
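
The DataConversionWarning above can be avoided by passing the labels as a 1-D array (e.g. `label['label']` or `np.ravel(label)`). Also, with only six training vectors, a single held-out subject says little about accuracy; leave-one-out cross-validation uses each subject as the test case once. A minimal sketch with synthetic stand-in vectors (the real 360-dimensional subject vectors are not reproduced here), assuming a current scikit-learn with the `sklearn.model_selection` module:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the six 360-dimensional subject vectors.
rng = np.random.RandomState(1)
X = rng.randint(0, 100, size=(6, 360)).astype(float)
y = np.array([1, 1, 2, 2, 3, 3])  # 1-D labels: no DataConversionWarning

# Leave-one-out: train on five subjects, test on the remaining one.
correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = LinearSVC(random_state=1, max_iter=10000)
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

print("leave-one-out accuracy: %d/6" % correct)
```

On the real data, `X = train_data.T.values` and `y = label['label'].values` would preserve the subject-to-label pairing used above.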

Testing the accuracy of the classifier with the additional test data


In [85]:
predicted = svc.predict(test_data['miyuki'])
predicted = pd.Series(predicted)
predicted


Out[85]:
0    1
dtype: int64

The above result shows that the classifier returned the value 1, which is the label for subjects whose English level was "LOW". The correct label for the test data, however, is 3.

Results & Discussion

  • The comparison of standard deviations showed a significant difference in a Japanese speaker's body gestures when speaking English versus Japanese. The differences, however, did not consistently support the first hypothesis: in some cases the body gestures were significantly larger when speaking Japanese than English (e.g. arm movement for subject no.2, head movement for subjects no.4 and no.6).

  • The differences in arm movement were similar for the fluent English speakers, which supports the second hypothesis, but at the same time the head movement of subject no.6 was significantly larger in Japanese, which contradicts it.

  • Subject no.1, who was not confident in English, showed significantly larger body gestures when speaking English than Japanese, but subject no.2, who was also not confident in English, showed significantly larger arm movement when speaking Japanese, which likewise contradicts the second hypothesis.

  • Although I was able to create a classifier, its prediction for the test data was incorrect. With only six training samples and one test sample it is difficult to assess the classifier's accuracy, but I presume the test sample was misclassified because the subject appeared nervous and moved less than usual. I chose a Support Vector Classifier because it is based on LIBSVM, an open-source SVM library developed in Taiwan that I was familiar with from my final undergraduate thesis, and because I was curious how an SVM would handle the collected data.

Conclusion

  • Although I found a significant difference in body gestures between the two languages, I was not able to confirm the initial hypothesis that Japanese speakers make larger body gestures when speaking English than when speaking Japanese.

  • I was also unable to confirm the hypothesis that English fluency is negatively correlated with the difference in body gestures between Japanese and English.

  • Improvements could be made by putting accelerometers on both hands, since some subjects seemed to have a habit of gesturing with one specific hand. In addition, the weight of the Arduino may have discouraged subjects from moving their arms, so collecting data without disturbing the subjects is another issue to address.


In [ ]: